ABSTRACT



A method for controlling a set of services in a cluster 
computer system. The set of services is registered with a 
service controller in the cluster computer system. The set of 
services is monitored for a failure of a service within the set 
of services. In response to a failure of the service, a failure 
sequence is initiated. An appropriate start sequence is initi- 
ated when the failed service can be restarted. 
BACKGROUND OF THE INVENTION

1. Technical Field 

The present invention relates generally to an improved 
data processing system, and in particular to an improved 
method and apparatus for managing services. Still more
particularly, the present invention relates to a method and 
apparatus for managing services in a cluster computer 
system. 

2. Description of Related Art

Internet, also referred to as an "internetwork", in com- 
munications is a set of computer networks, possibly 
dissimilar, joined together by means of gateways that handle 
data transfer and the conversion of messages from the 
sending network to the protocols used by the receiving
network (with packets if necessary). When capitalized, the 
term "Internet" refers to the collection of networks and 
gateways that use the TCP/IP suite of protocols. TCP/IP 
stands for Transmission Control Protocol/Internet Protocol. 
This protocol was developed by the Department of Defense
for communications between computers. It is built into the 
UNIX system and has become the de facto standard for data 
transmission over networks, including the Internet. 

The Internet has become a cultural fixture as a source of 
both information and entertainment. Many businesses are
creating Internet sites as an integral part of their marketing 
efforts, informing consumers of the products or services 
offered by the business or providing other information 
seeking to engender brand loyalty. Many federal, state, and 
local government agencies are also employing Internet sites
for informational purposes, particularly agencies which 
must interact with virtually all segments of society such as 
the Internal Revenue Service and secretaries of state. Oper- 
ating costs may be reduced by providing informational 
guides and/or searchable databases of public records online.

Currently, the most commonly employed method of trans- 
ferring data over the Internet is to employ the World Wide 
Web environment, also called simply "the web". Other 
Internet resources exist for transferring information, such as 
File Transfer Protocol (FTP) and Gopher, but have not
achieved the popularity of the web. In the web environment, 
servers and clients effect data transaction using the Hyper- 
text Transfer Protocol (HTTP), a known protocol for han- 
dling the transfer of various data files (e.g., text, still graphic 
images, audio, motion video, etc.). Information is formatted
for presentation to a user by a standard page description 
language, the Hypertext Markup Language (HTML). In 
addition to basic presentation formatting, HTML allows 
developers to specify "links" to other web resources, including
web sites, identified by a Uniform Resource Locator
(URL). A URL is a special syntax identifier defining a 
communications path to specific information. Each logical 
block of information accessible to a client, called a "page" 
or a "web page", is identified by a URL. The URL provides 
a universal, consistent method for finding and accessing this
information by the web "browser". A browser is a program 
capable of submitting a request for information identified by 
a URL at the client machine. Retrieval of information on the 
web is generally accomplished with an HTML-compatible 
browser, such as, for example, Netscape Communicator,
which is available from Netscape Communications Corpo- 
ration. 

A web site is typically located on a server, which in some
cases may support multiple web sites. In providing infor- 
mation to various users across the Internet, cluster computer 
systems are often used to provide adequate bandwidth for 
transmitting and receiving information. Sometimes the ser- 
vices on a cluster computer system may fail and require one 
or more servers within the cluster computer system to be 
restarted. It would be advantageous to have a method and 
apparatus for managing, starting, stopping, and restarting of 
services within a cluster computer system. 

SUMMARY OF THE INVENTION 

The present invention provides a method for controlling a 
set of services in a cluster computer system. The set of 
services is registered with a service controller in the cluster 
computer system. The set of services is monitored for a 
failure of a service within the set of services. In response to 
a failure of the service, a failure sequence is initiated. An 
appropriate start sequence is initiated when the failed service 
can be restarted. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The novel features believed characteristic of the invention 
are set forth in the appended claims. The invention itself, 
however, as well as a preferred mode of use, further objec- 
tives and advantages thereof, will best be understood by 
reference to the following detailed description of an illus- 
trative embodiment when read in conjunction with the 
accompanying drawings, wherein: 

FIG. 1 is a pictorial representation of a distributed data 
processing system in which the present invention may be 
implemented; 

FIG. 2 is a block diagram of a data processing system,
which may be implemented as a server, in accordance with the
present invention;

FIG. 3 is a diagram of a server system in the form of a 
cluster computer system in accordance with a preferred 
embodiment of the present invention; 

FIG. 4 is a diagram of components used in managing 
services in accordance with a preferred embodiment of the 
present invention; 

FIG. 5 is a diagram illustrating the states for cluster 
coordinator components in accordance with a preferred 
embodiment of the present invention; 

FIG. 6 is a flowchart of a process for registering services
in accordance with a preferred embodiment of the present 
invention; 

FIGS. 7A-7B are a flowchart of a process for initializing 
services in accordance with a preferred embodiment of the 
present invention; 

FIG. 8 is a flowchart of a process for monitoring and 
handling service failures in accordance with a preferred 
embodiment of the present invention; 

FIG. 9 is a flowchart of a process for stopping services in 
response to a stop command sent to a cluster coordinator 
manager in accordance with a preferred embodiment of the 
present invention; 

FIG. 10 is a flowchart of a process for creating a standby 
cluster coordinator daemon in accordance with a preferred 
embodiment of the present invention; 

FIG. 11 is a flowchart of a process for restarting services 
in accordance with a preferred embodiment of the present 
invention; 

FIG. 12 is a flowchart illustrating the mechanism used by 
a service to shutdown in accordance with a preferred 
embodiment of the present invention; and 
FIGS. 13A-13C are illustrations of entries in a configuration
file in accordance with a preferred embodiment of the present
invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference now to the figures, and in particular with
reference to FIG. 1, a pictorial representation of a distributed
data processing system in which the present invention may
be implemented is depicted.

Distributed data processing system 100 is a network of
computers in which the present invention may be implemented.
Distributed data processing system 100 contains a network 102,
which is the medium used to provide communications links
between various devices and computers connected together
within distributed data processing system 100. Network 102 may
include permanent connections, such as wire or fiber optic
cables, or temporary connections made through telephone
connections.

In the depicted example, a server system 104 is connected
to network 102 along with storage unit 106. Server system
104 may include one or more servers connected to each
other in the depicted example. When more than one server
is located in server system 104, the system is referred to as
a cluster computer system. In addition, clients 108, 110, and
112 also are connected to network 102. These clients 108,
110, and 112 may be, for example, personal computers or
network computers. For purposes of this application, a
network computer is any computer, coupled to a network,
which receives a program or other application from another
computer coupled to the network. In the depicted example,
server system 104 provides data, such as boot files, operating
system images, and applications to clients 108-112.
Clients 108, 110, and 112 are clients to server system 104.
Distributed data processing system 100 may include additional
servers, clients, and other devices not shown. In the
depicted example, distributed data processing system 100 is
the Internet with network 102 representing a worldwide
collection of networks and gateways that use the TCP/IP
suite of protocols to communicate with one another. At the
heart of the Internet is a backbone of high-speed data
communication lines between major nodes or host
computers, consisting of thousands of commercial,
government, educational, and other computer systems, that
route data and messages. Of course, distributed data processing
system 100 also may be implemented as a number
of different types of networks, such as for example, an
intranet or a local area network.

FIG. 1 is intended as an example, and not as an architectural
limitation for the processes of the present invention.

Referring to FIG. 2, a block diagram of a data processing
system, which may be implemented as a server, is depicted
in accordance with the present invention. In the instance that
server system 104 is implemented as a single server, data
processing system 200 may be used as the server. Data
processing system 200 may be a symmetric multiprocessor
(SMP) system including a plurality of processors 202 and
204 connected to system bus 206. Alternatively, a single
processor system may be employed. Also connected to
system bus 206 is memory controller/cache 208, which
provides an interface to local memory 209. I/O bus bridge
210 is connected to system bus 206 and provides an interface
to I/O bus 212. Memory controller/cache 208 and I/O bus
bridge 210 may be integrated as depicted.

Peripheral component interconnect (PCI) bus bridge 214
connected to I/O bus 212 provides an interface to PCI local
bus 216. A number of modems 218-220 may be connected
to PCI local bus 216. Typical PCI bus implementations will
support four PCI expansion slots or add-in connectors.
Communications links to network computers 108-112 in
FIG. 1 may be provided through modem 218 and network
adapter 220 connected to PCI local bus 216 through add-in
boards.

Additional PCI bus bridges 222 and 224 provide interfaces
for additional PCI buses 226 and 228, from which
additional modems or network adapters may be supported.
In this manner, server 200 allows connections to multiple
network computers. A memory mapped graphics adapter
230 and hard disk 232 may also be connected to I/O bus 212
as depicted, either directly or indirectly.

Those of ordinary skill in the art will appreciate that the
hardware depicted in FIG. 2 may vary. For example, other
peripheral devices, such as optical disk drive and the like
also may be used in addition to or in place of the hardware
depicted. The depicted example is not meant to imply
architectural limitations with respect to the present invention.

The data processing system depicted in FIG. 2 may be, for
example, an IBM RISC/System 6000 system, a product of
International Business Machines Corporation in Armonk,
N.Y., running the Advanced Interactive Executive (AIX)
operating system.

With reference now to FIG. 3, a diagram of a server
system in the form of a cluster computer system is depicted
in accordance with a preferred embodiment of the present
invention. In accordance with a preferred embodiment of the
present invention, the processes of the present invention
may be implemented to manage processes in a cluster
computer system, as illustrated in FIG. 3. Server system 104
from FIG. 1 in the depicted example is a cluster computer
system configured with a router 300, a load balancing data
processing system 302 and servers 304-308. Corresponding
reference numbers in different figures represent corresponding
components unless specified otherwise. Router 300
receives packets destined for server system 104 from network
102. Load balancing data processing system 302
routes packets received by router 300 to an appropriate
server from servers 304-308. In the depicted example, load
balancing data processing system 302 employs load balancing
processes to maximize efficiency in processing requests
from various clients. One or more of the servers in servers
304-308 may implement the processes of the present invention.
These servers may be implemented using a server such
as data processing system 200 in FIG. 2. The server system
illustrated in FIG. 3 is not intended to imply architectural
limitations to a server system implementation of the present
invention.

The present invention provides a method, apparatus, and
instructions for managing services within a cluster computer
system. In particular, the present invention provides a cluster
coordinator manager that may spawn a cluster coordinator
daemon to start up, stop, and restart a set of services provided
by the cluster computer system. The cluster coordinator
manager may start up through a cluster coordinator daemon
a set of services that have been registered with the cluster
coordinator manager. The information on the registered
services may be found in a data structure such as a computer
configuration file (i.e., cscomputer.cfg). This configuration
file contains information, such as startup sequence, shutdown
sequence, time out information, and path names for
the registered services.

In the depicted examples, a cluster coordinator provides a
facility for starting, stopping, and restarting all services
provided by a cluster computer system. A cluster coordinator
resides on each computer within a cluster computer system.
The cluster coordinator is started first. The cluster coordinator
will bring up other services in the appropriate order and will
monitor each of the services and provide the necessary
restart in the event of failure of one or more services within
the cluster computer system. Failure of a service on another
computer may be detected through a cluster service that
monitors services on other computers within the cluster
computer system.

With reference now to FIG. 4, a diagram of components
used in managing services is depicted in accordance with a
preferred embodiment of the present invention. In managing
services, the cluster coordinator components 400 of the
present invention, namely cluster coordinator manager 402,
cluster coordinator daemon 404, and standby cluster
coordinator daemon 406, are all used to manage services 408.
Cluster coordinator manager 402, cluster coordinator daemon
404, and standby cluster coordinator daemon 406 provide the
ability to start, stop, and restart services 408 operating on
servers on a cluster computer system. Services 408 are services
that have been registered for use with cluster coordinator
components 400. Services 408 are monitored for failures, and
when a service fails, the components are employed to restart
services 408 in the proper order. Cluster coordinator manager
402 is an executable program responsible for starting,
stopping, or restarting services 408. The program includes a
start command used to start services in the proper order
based on a start up sequence listed in a configuration file. In
addition, a stop command is provided to shut down services
408 in an order specified by a shutdown sequence in the
configuration file. A restart command is used to restart all of
the services using the stop and start commands.

Cluster coordinator daemon 404 is spawned off as a
daemon process when a start command is issued to cluster
coordinator manager 402. Cluster coordinator daemon 404
spawns off or starts all of the services based on the order
listed in the startup sequence in the configuration file. In
starting these services, cluster coordinator daemon 404 also
passes a first argument to each of the cluster services during
the start up process. In the depicted example, this first
argument is "-cc", which is used to activate features or code
that is used by services 408 for interaction with cluster
coordinator components 400. In addition, cluster coordinator
daemon 404 monitors services 408. Services 408 are
services that are registered with cluster coordinator components
400. In addition, cluster coordinator daemon 404 is
responsible for stopping and restarting services 408 in the
proper order when any service failure occurs. Standby
cluster coordinator daemon 406 is present and monitors
cluster coordinator daemon 404 for a failure or absence of
this component. If cluster coordinator daemon 404 fails, the
services will be shut down. Then the services will be
restarted with standby cluster coordinator daemon 406
becoming the cluster coordinator daemon.

Services 408 include a number of different services
executing on a server in a cluster computer system. These
services include services that provide databases that contain
configuration and status information for the cluster computer
system. In addition, services 408 also include services used
in managing a cluster computer system. These services
include monitoring activity of services on other computers
within the cluster computer system and allowing the addition
and removal of computers from the cluster computer
system. Services 408 also may include applications providing
services to users, such as, for example, an E-Mail
program or a spreadsheet program.

With reference now to FIG. 5, a diagram illustrating the
states for cluster coordinator components is depicted in
accordance with a preferred embodiment of the present
invention. State machine 500 represents states for the cluster
coordinator components. State machine 500 begins in an
initialization state 502 in which all cluster service configuration
information is obtained from the configuration file. In
addition, in the initialization state, the services are started in
the order specified in the start up sequence. After the
services have been successfully started, state machine 500
shifts to steady state 504 in which monitoring for a failure
of services or of the cluster coordinator daemon occurs. If a
failure occurs, the state machine 500 shifts into recovery
state 506 in which services are restarted. If the cluster
coordinator daemon fails first, the services will be shut down
in recovery state 506, and the standby cluster coordinator
daemon will become the cluster coordinator daemon and
restart the services after the services have been shut down.
State machine 500 may shift into stop state 508 in response
to a stop command being used with the cluster coordinator
manager or in response to a failure to restart services
properly in recovery state 506. As can be seen, this state may
be entered from any other state in state machine 500. Stop state
508 results in the services being shut down and the other
cluster coordinator components being terminated.

The processes are managed by the cluster coordinator
components using shared mutexes to detect the process
status between state transitions. A mutex is a mechanism for
providing exclusive access control for resources between
processes. Through the use of mutexes, processes may
access and lock a shared resource, such as a shared memory.
A manager mutex is created when a start command is issued.
This mutex is locked during the startup process and is
unlocked after the startup process is completed. This mutex
is not generated in response to a stop command. A daemon
mutex is created and locked while the cluster coordinator
daemon is spawned off. This daemon mutex is unlocked if
the cluster coordinator daemon is terminated. The present
invention also employs a daemon component mutex, which
is created and locked while the cluster coordinator daemon
is spawned. The daemon component mutex is shared with all
cluster services. This mutex is unlocked during the shutdown
state or recovery state. The daemon component mutex
is used to allow the cluster services to shut down if the cluster
coordinator daemon process is terminated. A component
mutex is created and locked for each cluster service that is
spawned by the cluster coordinator daemon. This mutex is
used to detect the status of the service and is unlocked if the
cluster service is terminated. A shared memory mutex is
used by the services to access a shared memory structure. A
daemon standby mutex is shared by the cluster coordinator
daemon and the standby cluster coordinator daemon. This
mutex is used by the standby cluster coordinator daemon to
detect the presence of the cluster coordinator daemon.

With reference now to FIG. 6, a flowchart of a process for
registering services is depicted in accordance with a preferred
embodiment of the present invention. The process
begins by providing information for use by the cluster
configuration manager (step 602). This information may
include, for example, an executable program name for use
by the cluster coordinator daemon to start up the process or
time out values for the start up and shutdown sequence. All
services started by the cluster coordinator daemon have a
specific amount of time to perform all required initialization
and registration with the cluster coordinator daemon. If the
service does not respond within the specified amount of
time, this service is terminated by the cluster coordinator
daemon. This step helps avoid situations in which a com- 
ponent hangs and the cluster coordinator daemon has to wait 
for an infinite amount of time. 
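
The time out behavior described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `wait_for_registration`, the use of a `threading.Event` to signal that a service has registered, and the `terminate` callback are all assumptions introduced here.

```python
import threading


def wait_for_registration(registered, timeout_seconds, terminate):
    """Wait up to timeout_seconds for a service to signal registration.

    registered: threading.Event set by the service once it has initialized
                and registered (hypothetical signalling mechanism).
    terminate:  callable that kills the service (hypothetical).
    Returns True if the service registered in time, False otherwise.
    """
    # Event.wait returns True if the event was set before the timeout expired.
    if registered.wait(timeout_seconds):
        return True
    # The service did not respond in time, so it is terminated instead of
    # letting the daemon wait indefinitely.
    terminate()
    return False
```

The time out value itself would come from the configuration file entry for the service.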

Thereafter, the first argument is retrieved (step 604). A 
determination is made as to whether the first argument is a
"-cc" (step 605). In the depicted example, the first argument 
is a "-cc" from the main argument, which is provided as an 
indication that the service is to interface with the cluster 
coordinator components. If the first argument is a "-cc", the 
alive API is invoked (step 606). This API is used to indicate
to the cluster coordinator components that the service is up
and running. This API provides a shutdown routine for 
callback to shutdown the service when the cluster coordi- 
nator daemon decides to shutdown services. Thereafter, a 
determination is made as to whether a return code indicates
that this API is correctly executing (step 608). If the API 
return code indicates that the API is okay, the process then 
invokes the ready API (step 610). This API is invoked after 
initialization of the service is complete and it is time to begin 
interacting with other services on the server computer. A
determination is then made as to whether the return code 
indicates that the ready API is executing (step 612). If the 
return code indicates that the ready API is executing, the 
service then executes normally to provide services (step 
614) with the registration process terminating thereafter.
With reference again to steps 608 and 612, if either of the
return codes does not indicate that the API is executing
normally, the service terminates. Steps 600 and 602 only
need to be performed the first time that the service is
registered with the cluster coordinator components. These
steps are not actually performed by the services but are steps 
used to include information within the configuration file. 
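
The service-side portion of this flow (steps 604-614) can be sketched as follows. The sketch is illustrative only: the source names an "alive" API and a "ready" API but gives no signatures, so the function names, the callable parameters, and the success code `OK = 0` are assumptions. The behavior when "-cc" is absent is also not stated in the source; here the service is assumed to run without interacting with the cluster coordinator components.

```python
OK = 0  # assumed success return code; the source does not specify one


def register_service(argv, alive, ready, run, shutdown_callback):
    """Sketch of steps 604-614. Returns True if the service ran normally.

    alive(shutdown_callback) and ready() stand in for the alive and ready
    APIs; both are hypothetical callables returning a return code.
    """
    # Steps 604-605: retrieve the first argument and test for "-cc".
    first_arg = argv[1] if len(argv) > 1 else None
    if first_arg != "-cc":
        # Assumption: without "-cc" the service runs standalone.
        run()
        return True
    # Steps 606-608: report that the service is up and hand over the
    # shutdown routine to be called back by the coordinator.
    if alive(shutdown_callback) != OK:
        return False  # the service terminates
    # Steps 610-612: initialization is complete.
    if ready() != OK:
        return False  # the service terminates
    # Step 614: execute normally.
    run()
    return True
```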

When a failure other than services executing on various 
computers within the cluster computer system occurs, each 
service is responsible for taking appropriate actions whenever
these events occur. For example, a failure of a service
on another computer within the cluster computer system will 
cause a restart of all services on all of the computers within 
the cluster computer system. 

Turning now to FIG. 7A, a flowchart of a process for
initializing services in response to a start command is 
illustrated in accordance with a preferred embodiment of the 
present invention. This process is used during the initializa- 
tion state. The process begins by reading a configuration file 
(step 700) to obtain start up and shutdown information. A
determination is made as to whether the configuration file is 
okay (step 702). If the configuration file is not okay, an error 
code is returned to indicate a configuration error (step 704) 
with the process terminating thereafter. If the configuration 
file is usable, the shared memory is initialized and created
and mutexes are created for the cluster coordinator daemon, 
the standby cluster coordinator, and registered services (step 
706). A mutex indicates to potential users of the shared 
resource whether the resource is in use and prevents access 
by more than one user.
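
The source does not define what makes the configuration file "okay" in step 702. As one possible reading, the sketch below checks an in-memory form of the information the file is said to contain (startup sequence, shutdown sequence, time out values, and path names). The `ServiceEntry` type, its field names, and the specific checks are hypothetical and are not taken from the file format of FIGS. 13A-13C.

```python
from dataclasses import dataclass


@dataclass
class ServiceEntry:
    """Hypothetical per-service record built from the configuration file."""
    path: str                # path name of the executable
    start_timeout: float     # seconds allowed for start up
    shutdown_timeout: float  # seconds allowed for shutdown


def config_is_okay(services, startup_sequence, shutdown_sequence):
    """Return True if both sequences cover every registered service once
    and every service has a path and positive time out values."""
    names = set(services)
    for sequence in (startup_sequence, shutdown_sequence):
        # No duplicates, and exactly the registered services.
        if len(sequence) != len(set(sequence)) or set(sequence) != names:
            return False
    return all(
        bool(entry.path)
        and entry.start_timeout > 0
        and entry.shutdown_timeout > 0
        for entry in services.values()
    )
```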

Next, configuration information is copied to the shared 
memory (step 708). The cluster coordinator daemon is 
spawned (step 710). A determination is then made as to 
whether the cluster coordinator daemon was successfully 
spawned (step 712). If the cluster coordinator daemon is not
running, the process proceeds to clean up the memory and
mutexes (step 713) and then proceeds to step 704 to return 
an error code. Otherwise, the process waits for the cluster 
coordinator daemon to spawn all of the registered services 
(step 714). Thereafter, a determination is made as to whether
all registered services have been successfully started and are 
running (step 716). If the services are not running, the 
process proceeds to step 713 as described above. Otherwise,
an indication is returned that the start command has been 
successfully completed (step 718) with the start process 
terminating thereafter. 

With reference now to FIG. 7B, a flowchart of a process 
for starting services by the cluster coordinator daemon is 
depicted in accordance with a preferred embodiment of the 
present invention. The process begins by locking the shared 
memory (step 720). This step is used to prevent access to the 
shared memory to processes other than the cluster coordinator
daemon. Next, the daemon component mutex is acquired
(step 722). This mutex is used in the recovery state to
shut down and restart services. A service is then started
using the startup sequence in the shared memory (step 724). 
A determination is then made as to whether any service 
started has failed (step 726). If no services have failed, a 
determination is made as to whether the service started in
step 724 is the last service in the startup sequence (step 728).
If more services in the sequence have not been started, the
process then returns to step 724. Otherwise, the standby 
cluster coordinator daemon is spawned (step 730). Then, a 
determination is made as to whether the standby cluster 
coordinator daemon was successfully started (step 732). If 
the answer to this determination is yes, the process then 
unlocks the shared memory (step 734) with the process 
proceeding to step 716 in FIG. 7A.

With reference again to step 726, if a service has failed 
during the startup of services, all of the services that have 
been started are shutdown using the shutdown sequence if 
possible (step 736) with the process then proceeding to step 
716 in FIG. 7A.
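
The ordered start with rollback on failure (steps 724-728 and 736) can be sketched as follows. The function name and the `start` and `stop` callables are hypothetical stand-ins for spawning and shutting down a service; the locking of the shared memory and the spawning of the standby daemon are omitted.

```python
def start_in_sequence(startup_sequence, shutdown_sequence, start, stop):
    """Start services in order; on a failure, shut down those already started.

    start(name) returns True on success; stop(name) shuts a service down.
    Returns True if every service in the startup sequence was started.
    """
    started = []
    for name in startup_sequence:
        # Steps 724-726: start the next service and check for failure.
        if not start(name):
            # Step 736: shut down the services already started, following
            # the order given by the shutdown sequence.
            for other in shutdown_sequence:
                if other in started:
                    stop(other)
            return False
        started.append(name)
    # Step 728: the last service in the startup sequence has been started.
    return True
```

For a startup sequence `["a", "b", "c"]` and shutdown sequence `["c", "b", "a"]`, a failure while starting `c` results in `b` and then `a` being stopped.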

Turning now to FIG. 8, a flowchart of a process used in 
the steady state and recovery state by a cluster coordinator 
daemon is illustrated in accordance with a preferred embodi- 
ment of the present invention. This process is employed by 
a cluster coordinator daemon to monitor the services and 
provide shutdown and restarting of services in the event a 
service fails. The process begins by monitoring services for
a failure (step 800). A determination is made as to whether
a failure has been detected (step 802). If a failure of a service 
is absent, the process will return to step 800. Otherwise, the 
daemon component mutex is released, which allows the 
services to start accessing the shared memory and shutdown 
in the proper order based on the shutdown sequence infor- 
mation from the configuration file stored in the shared 
memory (step 804). Thereafter, a determination is made as 
to whether the services were shutdown in the proper order 
(step 806). If the services were not shutdown in the proper 
order, all of the registered services are killed or terminated (step
808). Then, a determination is made as to whether the 
number of retries has been exceeded (step 810). The number 
of retries is selected to terminate attempts to restart the 
services when a successful restart is unlikely. The process 
proceeds directly to step 810 from step 806 when the 
services have been shutdown in the proper order. If the 
number of retries has not been exceeded, the daemon 
component mutex for the cluster coordinator daemon is 
acquired by the cluster coordinator daemon (step 812). 
Thereafter, a determination is made as to whether the 
services have been restarted in the proper order (step 814). 
If the services have been restarted in the proper order based 
on the startup sequence from the configuration file, the 
process then returns to step 800 as described above. 
Otherwise, the process returns to step 804 to release the 
daemon component mutex. 
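
The recovery loop of FIG. 8 (steps 804-814, with the retry limit of step 810) can be sketched as follows. The callables and the way retries are counted are assumptions; the source states only that a number of retries is selected and checked.

```python
def recover(shutdown_in_order, kill_all, restart_in_order, max_retries):
    """Sketch of the recovery path entered after a failure is detected.

    shutdown_in_order() and restart_in_order() return True on success;
    kill_all() force-terminates all registered services. All three are
    hypothetical callables. Returns True if the services were restarted,
    or False if the retry limit was reached and the caller should proceed
    to terminate.
    """
    retries = 0
    while True:
        # Steps 804-808: orderly shutdown, falling back to killing services.
        if not shutdown_in_order():
            kill_all()
        # Step 810: give up once the retry limit has been reached.
        if retries >= max_retries:
            return False
        retries += 1
        # Steps 812-814: attempt an ordered restart.
        if restart_in_order():
            return True  # back to monitoring (step 800)
        # Otherwise loop back to step 804 and try again.
```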

With reference again to step 810, if the number of retries 
has been exceeded, the standby cluster coordinator daemon 
is shut down (step 816). Then, cleanup occurs and the cluster
coordinator daemon is terminated (step 818), with the process
terminating thereafter.

With reference now to FIG. 9, a flowchart of a process for
stopping services in response to a stop command sent to a
cluster coordinator manager is depicted in accordance with
a preferred embodiment of the present invention. The process
begins by checking the shared memory (step 900). A
determination is made as to whether the shared memory
exists (step 902). If the shared memory does not exist, an
error is returned (step 904) with the process terminating
thereafter. Otherwise, all the mutexes are checked (step
906). Then, a determination is made as to whether the
manager mutex is present (step 908). If the manager mutex
is present, the cluster coordinator manager is shut down (step
910), and then the cluster coordinator daemon is shut down
(step 912). Next, the standby cluster coordinator daemon is
shut down (step 914). Then, the process makes sure that all
of the registered services are shut down (step 916), and the
shared memory and mutexes are cleaned up (step 918).
Afterwards, an indication of the successful completion of
the stop command is returned (step 920). This command
may be used if the cluster coordinator daemon is removed
and the services cannot be removed while the cluster
coordinator daemon is absent. This command is used to notify
the services to clean up and exit.

With reference again to step 908, if the manager mutex is not
present, a determination is made as to whether the daemon
mutex is present (step 922). If the daemon mutex is present,
the process then proceeds to step 912 as described above.
Otherwise, a determination is made as to whether the
component mutex is present (step 924). If the component
mutex is present, the process then proceeds to step 916 as
described above. Otherwise, the process terminates.

Turning now to FIG. 10, a flowchart of a process for
creating a standby cluster coordinator daemon is depicted in
accordance with a preferred embodiment of the present
invention. The process begins by spawning the standby
cluster coordinator daemon (step 1000). When the standby
cluster coordinator daemon is spawned, a thread is created
to monitor the cluster coordinator daemon (step 1002). The
monitoring is performed through the use of a daemon
standby mutex. If the cluster coordinator daemon fails, the
standby cluster coordinator daemon becomes the cluster
coordinator daemon.

With reference now to FIG. 11, a flowchart of a process
for a standby cluster coordinator daemon for shutting down
and restarting services is depicted in accordance with a
preferred embodiment of the present invention. This process
is used in the recovery state when a cluster coordinator
daemon has failed. The process begins by detecting a cluster
coordinator daemon failure (step 1100). The failure is
detected when the standby cluster coordinator daemon is
able to acquire the daemon standby mutex, which is normally
held by the cluster coordinator daemon. In response,
the registered services are shut down (step 1102). The
shutdown in step 1102 uses the processes described in FIG. 8.
The standby cluster coordinator daemon wakes from the
suspended state by detecting the failure of the cluster
coordinator daemon using the daemon standby mutex. When
the last service shuts down, the standby cluster coordinator
daemon becomes the primary cluster coordinator daemon
(step 1104) and restarts the services (step 1106). The
restarting of services uses the process described in FIG. 8.
In addition, a new standby cluster coordinator daemon is
spawned (step 1108).

If the standby cluster coordinator daemon goes away
during the suspended state, the suspended state wakes the
primary cluster coordinator daemon. The primary cluster
coordinator daemon responds by spawning another standby
cluster coordinator daemon.

Turning now to FIG. 12, a flowchart illustrating the
mechanism used by a service to shut down is depicted in
accordance with a preferred embodiment of the present
invention. The process begins by a service acquiring the
daemon component mutex (step 1200). Each service that is
initialized includes a suspended thread to acquire the mutex
for the shutdown information. If one service acquires the
daemon component mutex, other services are blocked from
acquiring the mutex. The service then locks the shared
memory (step 1202). A determination is made as to whether
the service accessing the shared memory is the first service
in the shutdown sequence in the shared memory (step 1204).
If the service is not the first service, the process then
determines whether the first service listed in the shared
memory is still executing (step 1206). If the first
service is not still alive, a determination is then made as to
whether the current service is the next service on the
shutdown sequence (step 1208). If the current service is not
the next service on the shutdown sequence, the shared
memory is unlocked (step 1210), and the daemon component
mutex is released (step 1212). Thereafter, other services
are allowed to acquire the mutex (step 1214) with the
process then returning to step 1200 as described above.

Referring again to step 1204, if the service is the first
service in the shutdown sequence, the shutdown sequence is
updated (step 1216). The shared memory is then unlocked
(step 1218), and the callback shutdown function is invoked
(step 1220). The daemon component mutex is released
(step 1222) with the process terminating thereafter. This
shared memory mechanism is used for an orderly shutdown
of the services. The shutdown may occur even when the
cluster coordinator daemon fails. The daemon component
mutex is held by the cluster coordinator daemon until the
cluster daemon releases this mutex or fails.

With reference again to step 1206, if the first service is
still present, the process then proceeds directly to step 1210.
With reference again to step 1208, if the current service is
the next one on the shutdown sequence, the process then
proceeds to step 1216 as described above.

With reference now to FIGS. 13A-13B, illustrations
of entries in a configuration file are depicted in accordance
with a preferred embodiment of the present invention.
Configuration file 1300 includes a cluster services
configuration table 1302, a daemon configuration table 1304,
and a manager configuration table 1306. Cluster services
configuration table 1302 includes entries for the various
services that are registered. The entries include an
identification 1308 for the service, a lock name 1310,
executable name 1312, start up time out 1314, and a shutdown
time out 1316.

Lock name 1310 is the name of the component mutex
used to detect component process status. Executable name
1312 is the name of the program used to start the service.
Start up time out 1314 is the maximum time that will be used
to start up the service, and shutdown time out 1316 is the
maximum time that will be allowed to shut down the service.
Both time out values are in milliseconds in the depicted
example.

Cluster services configuration table 1302 includes
information used by the cluster coordinator daemon. These
entries include: executable name 1318, lock name 1319,
startup sequence 1320, shutdown sequence 1322, startup
time out 1324, shutdown time out 1326, restart retry counter
1328, and restart time span 1330.
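As a rough illustration only, the per-service entries 1308-1316 and the daemon-related entries 1318-1330 listed above could be modeled as plain records. The following Python sketch is not taken from the figures: the class names, field names, and the helper `unknown_services` are hypothetical, and it assumes that the startup and shutdown sequences refer to services by their identification 1308. The units of the daemon-level time outs and of the restart time span are not stated in the text, so those fields carry no unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceEntry:
    """One row of the per-service entries (hypothetical field names)."""
    identification: str       # identification 1308
    lock_name: str            # lock name 1310 (component mutex name)
    executable_name: str      # executable name 1312
    startup_timeout_ms: int   # start up time out 1314, in milliseconds
    shutdown_timeout_ms: int  # shutdown time out 1316, in milliseconds


@dataclass
class DaemonSettings:
    """Entries used by the cluster coordinator daemon (hypothetical names)."""
    executable_name: str           # executable name 1318
    lock_name: str                 # lock name 1319
    startup_sequence: List[str]    # startup sequence 1320
    shutdown_sequence: List[str]   # shutdown sequence 1322
    startup_timeout: int           # startup time out 1324 (unit not stated)
    shutdown_timeout: int          # shutdown time out 1326 (unit not stated)
    restart_retry_counter: int     # restart retry counter 1328
    restart_time_span: int         # restart time span 1330 (unit not stated)


def unknown_services(services, daemon):
    """Return sequence entries that do not match any registered service."""
    known = {s.identification for s in services}
    unknown = []
    for name in daemon.startup_sequence + daemon.shutdown_sequence:
        if name not in known and name not in unknown:
            unknown.append(name)
    return unknown
```

A check such as `unknown_services` is an added convenience for catching a sequence that names an unregistered service; the description itself does not mention such validation.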

Executable name 1318 is the program used to start the
cluster coordinator. Startup sequence 1320 is the start up
sequence for starting the services while shutdown sequence
1322 is the sequence used to shut down services. As can be
seen, these two sequences may vary. Startup time out 1324
is the maximum amount of time allowed to start up all of the
services and shutdown time out 1326 is the maximum
amount of time allowed to shut down all of the services.
Restart retry counter 1328 is used to determine the maximum
number of retries that will be made in starting up the
services. Restart time span 1330 indicates the amount of
time that passes before the number of retries is automatically
reset.
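The interplay between restart retry counter 1328 and restart time span 1330 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the preferred embodiment: the class name `RestartBudget`, the use of seconds as the unit, and the choice to measure the time span from the first counted retry are all assumptions, since the text does not specify them.

```python
import time


class RestartBudget:
    """Limits restart attempts and resets the count after a time span."""

    def __init__(self, max_retries, time_span_s, clock=time.monotonic):
        self.max_retries = max_retries  # role of restart retry counter 1328
        self.time_span_s = time_span_s  # role of restart time span 1330
        self._clock = clock             # injectable clock, useful for testing
        self._count = 0
        self._window_start = None

    def allow_restart(self):
        """Return True and record an attempt if another restart is permitted."""
        now = self._clock()
        # Reset the count once the time span has elapsed since the first
        # counted retry (assumed reference point).
        if (self._window_start is not None
                and now - self._window_start >= self.time_span_s):
            self._count = 0
            self._window_start = None
        if self._count >= self.max_retries:
            return False
        if self._window_start is None:
            self._window_start = now
        self._count += 1
        return True
```

When `allow_restart` returns False, a caller would take the "number of retries has been exceeded" branch described for step 810.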

Cluster services configuration table 1302 also includes the 
name of the daemon mutex 1332, name of the daemon 
component mutex 1334, and the name of the semaphore 
1336 used by the cluster coordinator daemon. Semaphore
1336 is the communication channel used between the cluster 
coordinator daemon and the services. 
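The daemon component mutex named in this table is the mutex that services contend for in the shutdown mechanism of FIG. 12. A minimal Python sketch of that turn-taking follows. It is a simplification under stated assumptions: threads stand in for service processes, an in-process list stands in for the shutdown sequence held in shared memory, all services are assumed to be alive (so the liveness checks of steps 1206 and 1208 are omitted), and the function name `run_ordered_shutdown` is hypothetical.

```python
import threading
import time


def run_ordered_shutdown(shutdown_sequence, callbacks):
    """Invoke each service's shutdown callback in shutdown_sequence order."""
    component_mutex = threading.Lock()   # stands in for the daemon component mutex
    shared_lock = threading.Lock()       # stands in for the shared memory lock
    remaining = list(shutdown_sequence)  # stands in for the sequence in shared memory

    def service(name):
        while True:
            with component_mutex:                    # step 1200: acquire the mutex
                with shared_lock:                    # step 1202: lock shared memory
                    is_first = remaining[0] == name  # step 1204: first in sequence?
                    if is_first:
                        remaining.pop(0)             # step 1216: update the sequence
                # Shared memory is unlocked here (steps 1210 and 1218).
                if is_first:
                    callbacks[name]()                # step 1220: shutdown callback
                    return                           # mutex released on exit (step 1222)
            # Not this service's turn: the mutex has been released (step 1212),
            # so other services may acquire it (step 1214) before the retry.
            time.sleep(0.001)

    threads = [threading.Thread(target=service, args=(n,)) for n in shutdown_sequence]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the callback runs while the component mutex is still held, at most one service is shutting down at a time, which is what makes the shutdown orderly in this sketch.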

Manager configuration table 1306 includes lock shared 
name 1338, manager name 1340, semaphore 1342, and 
memory shared name 1344.

Lock shared name 1338 is the shared memory name. 
Manager name 1340 is the name of the manager mutex. 
Semaphore 1342 is the communications channel between 
the cluster coordinator manager and the cluster coordinator 
daemon processes. Memory shared name 1344 is the shared
memory mutex used by all components and services to 
access the shared memory. 
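The shared memory and the manager, daemon, and component mutexes named in these tables are the objects that the stop command of FIG. 9 tests for. The decision flow of that figure can be sketched in Python as below. This is an illustrative sketch only: `ClusterState` and `plan_stop` are hypothetical names, the presence of each object is reduced to a boolean, and the function returns the step numbers that would be executed instead of performing any shutdown.

```python
from dataclasses import dataclass

# Steps of FIG. 9 that run in order once the flow has been entered.
STEPS = [
    (910, "shut down cluster coordinator manager"),
    (912, "shut down cluster coordinator daemon"),
    (914, "shut down standby cluster coordinator daemon"),
    (916, "make sure registered services are shut down"),
    (918, "clean up shared memory and mutexes"),
    (920, "return success indication"),
]


@dataclass
class ClusterState:
    shared_memory_exists: bool
    manager_mutex_present: bool
    daemon_mutex_present: bool
    component_mutex_present: bool


def plan_stop(state):
    """Return the FIG. 9 step numbers a stop command would execute."""
    if not state.shared_memory_exists:    # step 902
        return [904]                      # error is returned
    if state.manager_mutex_present:       # step 908
        entry = 910
    elif state.daemon_mutex_present:      # step 922
        entry = 912
    elif state.component_mutex_present:   # step 924
        entry = 916
    else:
        return []                         # nothing to stop; process terminates
    return [num for num, _ in STEPS if num >= entry]
```

For example, when only the daemon and component mutexes are present, `plan_stop` skips step 910 and begins at step 912, matching the branch taken from step 922.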

It is important to note that while the present invention has
been described in the context of a fully functioning data
processing system, those of ordinary skill in the art will
appreciate that the processes of the present invention are
capable of being distributed in the form of a computer readable
medium of instructions and in a variety of forms, and that the
present invention applies equally regardless of the particular
type of signal bearing media actually used to carry out the
distribution. Examples of computer readable media include
recordable-type media such as a floppy disc, a hard disk drive,
a RAM, and CD-ROMs, and transmission-type media such
as digital and analog communications links.

The description of the present invention has been presented
for purposes of illustration and description, but is not
intended to be exhaustive or limited to the invention in the
form disclosed. Many modifications and variations will be
apparent to those of ordinary skill in the art. The embodiment
was chosen and described in order to best explain the
principles of the invention and the practical application, and to
enable others of ordinary skill in the art to understand the
invention for various embodiments with various modifications
as are suited to the particular use contemplated.

What is claimed is: 

1. A method for controlling a set of services in a cluster 
computer system, the method comprising the computer 
implemented steps of: 

registering the set of services with a service controller in
the cluster computer system; 

monitoring the set of services for a failure of a service 
within the set of services; 

responsive to the failure of the service, initiating a shutdown
sequence; and

initiating an appropriate start up sequence, wherein the
shutdown sequence and start up sequence are ordered
lists of a plurality of services of the set of services, and
wherein the shutdown sequence and start up sequence
are stored in a data structure.

2. The method of claim 1, wherein the set of services are
monitored by a cluster coordinator daemon.

3. The method of claim 1, wherein the cluster computer 
system includes a shared memory, wherein the shutdown 
sequence is stored within the shared memory and wherein 
the step of initiating a shutdown sequence includes allowing 
the set of services to access the shared memory to determine 
which services within the set of services should be shutdown 
using the shutdown sequence. 

4. The method of claim 3, wherein the appropriate start up 
sequence is stored in the shared memory and wherein a 
cluster coordinator daemon initiates starting the set of ser- 
vices using the start up sequence. 

5. The method of claim 1, wherein a daemon component 
mutex is used to shutdown services identified in the shut- 
down sequence and start up services identified in the start up 
sequence. 

6. A method for managing a set of services in a cluster 
computer system, the method comprising the computer 
implemented steps of: 

monitoring the set of services for a failure of a service
within the set of services;

responsive to detecting a failure of a service within the set
of services, initiating a shutdown of the set of services
in an order required to properly shutdown the set of
services; and

restarting the set of services in an order required for 
proper operation of the set of services, wherein the set 
of services are shutdown in accordance with a shut- 
down sequence and the set of services are restarted in 
accordance with a start up sequence, the shutdown 
sequence and start up sequence being ordered lists of 
the set of services, and wherein the shutdown sequence 
and start up sequence are stored in a data structure. 

7. The method of claim 6 further comprising starting the 
set of services in a selected order. 

8. The method of claim 7, wherein the set of services are 
started in the selected order by a cluster coordinator daemon. 

9. A cluster computer system having a set of services, the 
cluster computer system comprising: 

registration means for registering the set of services with 
a service controller in the cluster computer system; 

monitoring means for monitoring the set of services for a 
failure of a service within the set of services; 

first initiating means, responsive to the failure of the 
service, for initiating a shutdown sequence; and 

second initiating means for initiating an appropriate start 
up sequence, wherein the shutdown sequence and start 
up sequence are ordered lists of a plurality of services 
of the set of services, and wherein the shutdown 
sequence and start up sequence are stored in a data 
structure. 

10. The cluster computer system of claim 9, wherein the
set of services are monitored by a cluster coordinator
daemon.

11. The cluster computer system of claim 10, wherein the 
cluster computer system includes a shared memory, wherein 
the shutdown sequence is stored within the shared memory 
and wherein the step of initiating a shutdown sequence 
includes allowing the set of services to access the shared 
memory to determine which services within the set of 
services should be shutdown using the shutdown sequence. 

12. The cluster computer system of claim 11, wherein the
appropriate start up sequence is stored in the shared memory
and wherein a cluster coordinator daemon initiates starting
the set of services using the start up sequence.

13. A cluster computer system having a set of services, the 
cluster computer system comprising: 

monitoring means for monitoring the set of services for a
failure of a service within the set of services;

first initiating means, responsive to detecting a failure of
a service within the set of services, for initiating a
shutdown of the set of services in an order required to
properly shutdown the set of services; and

restarting means for restarting the set of services in an
order required for proper operation of the set of
services, wherein the set of services are shutdown in
accordance with a shutdown sequence and the set of
services are restarted in accordance with a start up
sequence, the shutdown sequence and start up sequence
being ordered lists of the set of services, and wherein
the shutdown sequence and start up sequence are stored
in a data structure.

14. The cluster computer system of claim 13 further 
comprising second starting means for starting the set of 
services in a selected order. 

15. The cluster computer system of claim 14, wherein the 
set of services are started in the selected order by a cluster
coordinator daemon. 

16. A computer program product in a computer readable 
medium for managing a set of services in a cluster computer 
system, the computer program product comprising:

first instructions for registering the set of services with a
service controller in the cluster computer system;

second instructions for monitoring the set of services for
a failure of a service within the set of services;

third instructions, responsive to the failure of the service,
for initiating a shutdown sequence; and

fourth instructions for initiating an appropriate start up
sequence, wherein the shutdown sequence and start up
sequence are ordered lists of a plurality of services of
the set of services, and wherein the shutdown sequence
and start up sequence are stored in a data structure.

17. A method of managing a set of services in a cluster 
computer system, comprising: 

registering the set of services with a cluster coordinator;

writing, to a shared memory, a configuration file having a
start up sequence identifying an ordered list of services
of the set of services;

spawning the cluster coordinator daemon process to
monitor the set of services for a failure of a service
within the set of services;

starting services in the set of services using the cluster
coordinator daemon process based on the start up
sequence; and

monitoring the set of services using the cluster coordina- 
tor daemon process. 

18. The method of claim 17, further comprising spawning 
a standby cluster coordinator daemon process. 

19. The method of claim 17, wherein monitoring the set 
of services further includes: 

determining if a service fails; 

providing the services in the set of services access to the
shared memory; and

shutting down the services in the set of services based on
a shutdown sequence stored in the shared memory.

20. The method of claim 19, wherein shutting down the 
services in the set of services includes: 

determining if a mutex for the cluster coordinator is 
present in the shared memory; 

shutting down the cluster coordinator daemon process, 
then the standby cluster coordinator, and then the 
services if the mutex is present in the shared memory. 

***** 